In four experiments, a total of 205 participants studied individual color patches and were 
given an old-new recognition test after a brief retention interval (0.5 or 5.0 s). The pattern 
of hue sensitivity (d') revealed hue memory shifting away from the prototype of the 
hue's basic color category. The shifts demonstrate that hue memory is influenced by 
categorization early in processing. The shifts did not depend on intentional categorization; 
the shifts were found even when participants made preference ratings at encoding rather 
than labeling judgments. Overall, we found that categorization and memory are deeply 
intertwined from perception onward. We discuss the impact of the results on theories 
of memory and categorization, including the effects of category labels on memory (e.g., 
Lupyan, 2008). We also put forward the hypothesis that atypical shifts in hue are related 
to atypical shifts that have previously been observed in face recognition (Rhodes et al., 1987). 
INTRODUCTION 

Memory, reasoning and categorization have traditionally been 
distinguished as separate topics and separate areas of research 
(Heit and Hayes, 2005; Heit et al., 2012; Hayes et al., 2014). It 
could be argued that categorization is either an automatic pro- 
cess as in categorical perception, where conscious reasoning is not 
recruited and any effect of categories on perception would appear 
to be due to the activation of categorical memory, or alternatively, 
categorization is an explicit process as in categorical decision tasks 
where a more deliberative process of reasoning may be at work. 
But the dichotomies of implicit vs. explicit and memory-based vs. 
reasoning-based categorization are too extreme; instead, a con- 
tinuum is likely to exist. We suggest that these cognitive activities 
are intimately intertwined. As Churchland (1981) pointed out, 
the terms we have from folk psychology, the ways we culturally 
divided cognitive processes prior to having scientific evidence to 
inform those divisions, are not necessarily sensible. As research 
advances in psychological and neurological understandings of 
cognitive processes, these traditional terms and divisions need to 
be broken down. The memory, reasoning and categorization dis- 
tinctions are losing their usefulness as separate constructs due to 
the likelihood of common underlying mechanisms. 

In this paper, we will be looking at a task that involves both 
memory and categorization. Experimental participants either 
label hues with basic color categories or make preference judg- 
ments about the hues. Then, memory for these hues is tested 
immediately. Participants have memory of categories and through 
categorization bring that memory to bear on newly formed 
encodings. In the way categorization and memory are often dis- 
cussed, categorization is the act of applying knowledge while 
memory is the substance of that knowledge. Yet using memory 
edits memory itself, as has been shown with memory reconsolidation 
(Nadel et al., 2012) and retrieval-induced forgetting 
(Anderson et al., 1994). Memory and categorization cannot be 
treated as fully distinct cognitive topics but are interdependent. 

The distinction between perception and memory is also a 
vague and possibly false distinction. Perception is the transduc- 
tion of light, sound waves, chemicals, pressure, and heat into 
electrical signals in the nervous system. Memory refers to the stor- 
age of that information. Milliseconds after a stimulus has been 
experienced, researchers consider it remembered in iconic mem- 
ory, some of which passes on to working memory and possibly to 
long-term memory. There has been a debate about how far top- 
down conceptual knowledge can impact perception with some 
arguing that perception is cognitively impenetrable (Pylyshyn, 
1999) and others arguing that cognitive expectations affect per- 
ception very early in processing (Churchland, 1988; Hsieh et al., 
2010). 

One of the main phenomena of interest in the cognitive 
penetrability debate is categorical perception, where categorical 
knowledge affects how people perceive the surrounding world. 
Categorical perception has been examined in many domains 
including phoneme perception (Liberman et al., 1957), faces 
(Levin and Beale, 2000), and color (Winawer et al., 2007; see 
Goldstone and Hendrickson, 2010, for a full review). In categorical 
perception, there is no deliberative reasoning; categorization 
is implicit and automatic, having an effect without people need- 
ing to actively decide on a category. Here, categorization appears 
to be based on implicit memory of frequent categorizations. 

Categorical perception has been explained diversely: as a pull 
toward the prototype (Lupyan, 2008), a truncation at the bound- 
aries of a category (Huttenlocher et al., 1991), or an expanding of 
perceptual space (Goldstone, 1994, 1998). None of these accounts 
of the effects of categories on perception would explain the novel result we 
present here: with rapid presentation and test of hues, there is 
an atypical bias, that is, a push away from the prototype, a pull toward 
the boundary, or a seemingly incompatible change of perceptual 
space. While this result is novel for hue memory, a similar effect 
has been observed in immediate recognition of exaggerated faces 
(Rhodes et al., 1987). During perception, people appear to bring 
categories to bear on the content of perception but the influence 
is not uniformly one of attraction toward the prototype. 

Our own investigation was spurred by the argument that 
labels affect the memory of perception when the labels coin- 
cide with perception (Lupyan, 2008). Specifically, for an effect 
that was metaphorically referred to as a representational shift, it 
was claimed that labels cause memory traces to be prototypically 
shifted from the raw percept by exerting a top-down influence 
of the labeled category on the perceived item. The label acti- 
vates the category prototype, which interacts in real time with 
the bottom-up perception, resulting in a mixed encoded memory 
trace. Specifically, these experiments looked at whether there 
was an advantage to remembering objects that were labeled or 
judged in terms of preference (liking). The participants either 
labeled object categories including chairs, lamps, and tables (two 
categories per experiment) or made a like/dislike preference judg- 
ment in alternating blocks during study. Participants only saw 
the objects for 300 ms and had 700 ms to respond to discourage 
labeling in the preference judgment trials. After all study trials, 
participants were then tested on their memory for the items using 
the original objects as well as a matched lure for each original item 
in a new/old recognition task. Participants less accurately remem- 
bered previously seen items if they had been categorically labeled, 
which was taken as evidence that the representation of the labeled 
objects was shifted — it no longer matched up to the originally 
perceived item. Other researchers (Richler et al., 2011; Blanco and 
Gureckis, 2013) have taken issue with this interpretation in terms 
of representational shift, instead suggesting that perceived items 
are remembered better because preference judgments require a 
greater depth of processing than category labeling. They intro- 
duced non-labeling conditions such as chair orientation (Blanco 
and Gureckis, 2013) and screen position (Richler et al., 2011) 
that only require superficial processing of the objects. These con- 
ditions performed similarly to the category labeling condition 
introduced by Lupyan (2008). To these researchers, the strength of 
the memory accounts for the differences in recognition memory. 

The controversial claim from Lupyan (2008) that there are 
prototypical representational shifts has not been demonstrated 
directly. Only a decrease in accurate recognition of previously 
seen items has been shown, which could mean a shift toward the 
prototype, away from the prototype, or simple non-directional 
forgetting. Hence, previous research on representational shifts 
has not provided clear evidence that representations have shifted, 
much less in what direction they have shifted. To get at this 
question we will present a paradigm that is a conceptual repli- 
cation of Lupyan (2008) using the same judgment conditions at 
target presentation, category labeling and preference judgments, 
and using a similar memory test, same/different judgments rather 
than new/old recognition judgments. The main differences are in 
stimuli and timing. We present the targets as well as four matched 
lures varying systematically in category typicality and distance 
from the target. These stimuli will allow us to quantify the direction 
and magnitude of any representational shifts that occur. If a 
shift is in the typical direction as predicted by Lupyan (2008), the 
new array of test stimuli will allow it to be seen. 

Previous work on representational shifts has examined mem- 
ory for objects such as lamps and chairs. In our own work, we 
focus on color space, which is more quantifiable and better- 
defined than object space. Color is a continuous uniform physical 
space made up of different wavelengths of light. Color is also a 
rich psychological space that is divided into superordinate, basic, 
and subordinate categories. There are focal or prototypical colors 
within categories as well as boundaries where one category meets 
the next that are shared amongst speakers of the same language, 
and to some extent across languages (Berlin and Kay, 1969; Regier 
and Kay, 2009). As such, color space is a fertile testing ground for 
examining how categorical knowledge distorts basic perception. 

Taking account of the psychological landscape of the color 
domain and people's ability to detect fine alterations from one 
color stimulus to the next, we were able to directly test the color 
that has been encoded through a recognition test, and how dif- 
ferent, if at all different, the encoded color is from the originally 
presented color. By moving from object space to hue space, we 
constrain the potential directions of memory shifts toward or 
away from the prototype. By testing memory of the target as 
well as four matched lures differing in distance and direction in 
hue space relative to the prototype, we have the opportunity to 
measure the sensitivity (d') of hue memory at different locations 
relative to the target. Sensitivity serves as a measure of confusabil- 
ity and strength of confidence in having seen something at the 
point in hue space. Where d' is high, people can reject lures that 
they have not seen. Where d' is low, this means that lure items 
nonetheless seem relatively familiar, as if there is a false mem- 
ory representation at that point in hue space. Moreover, if d' is 
lower in one direction, relative to the prototype, compared to the 
other direction, this implies that the representation in memory 
has shifted directionally. Our paradigm allows us to see the shift 
as well as quantify its direction and strength. 

Additionally, the representational shift hypothesis focuses on 
encoding. However, the paradigm used in the original paper 
(Lupyan, 2008) as well as the versions of the paradigm used in 
subsequent work (Richler et al., 2011, 2013; Blanco and Gureckis, 
2013) have an extended study phase presenting all items twice 
prior to a test phase of all items resulting in a delay of min- 
utes between presentation and test. This format does not isolate 
effects down to the time of encoding. Our paradigm focuses 
on immediate memory to more closely address encoding. We 
use a same/different judgment as the memory test either 500 ms 
after target presentation (Experiments 1A and B) or 5000 ms 
after target presentation (Experiments 2A and B). This prevents 
interference of other hues on the representation of the key item 
between study and test. 

We now present four experiments, two main experiments 
and two direct replications. Experiment 1A was designed to 
test the memory for a color soon after encoding. The delay 
between original presentation and the same/different judgment 
was 500 ms. We found an atypical shift — unexpected based on 
previous research which had suggested that the shift would be 
toward the prototype — with no difference between judgment 
conditions. In Experiment 2A, the delay was increased to 5000 ms 



Frontiers in Psychology | Cognitive Science 



July 2014 | Volume 5 | Article 796 | 2 



Kelly and Heit 



Atypical representational shifts 



to test if the predicted prototypical shift could be observed at a 
longer delay and if there was an effect of judgment condition that 
developed over time. The atypical shift and lack of judgment condition 
effect were reproduced. Because some participants were 
excluded from Experiments 1A and 2A, and because of the unexpected 
direction of the representational shift, we conducted direct replications 
of both experiments with higher power, in Experiments 
1B and 2B. 

EXPERIMENT 1A 

In a conceptual replication of Lupyan (2008), we presented par- 
ticipants with hues to judge either by category or by preference. In 
a departure from the previous paradigm that had separate study 
and test phases, test occurred immediately after study within 
a trial. Given that the representational shift hypothesis is one 
of shift at encoding, shifts should be immediately detectable. 
Additionally, rather than having one matched lure for each stud- 
ied item, there were four lures spanning both potential directions 
of movement relative to the prototype and two distances in hue 
space. Using sensitivity (d') as the dependent measure, we deter- 
mined whether memory shifted at all, if it shifted toward the 
prototype or away from it, and the approximate distance of the 
shift in hue space. In particular, lower d' values indicate a higher 
false-alarm rate to lures. So, for example, if representations shift 
toward the prototype, there will be greater likelihood of false- 
alarming to typical lures compared to atypical lures, and d' will 
be lower for typical items than for atypical items. 

METHODS 
Participants 

Thirty-six students at the University of California, Merced par- 
ticipated in these experiments for course credit. All partici- 
pants reported normal vision and normal color vision. Their 
color vision was tested using the CITY colorblindness test 
(City University, 2002) following the main experiment. The 
research was approved by the University of California, Merced 
Institutional Review Board and verbal consent was obtained from 
each participant. 

Materials 

The color stimuli were calculated in CIE L*CH color space then 
translated to CIE L*ab color space. The stimuli were from two 
color categories, red and green. Focal colors, treated as the cat- 
egory prototypes, were obtained from Sturges and Whitfield 
(1995). Saturation and brightness were held constant at the focal 
saturation and focal brightness. Within these color categories, 
four target colors were selected for a total of eight target colors 
across the two categories. All target hues were of similar typicality 
relative to the prototypes though explicit typicality measures were 
not collected. The targets were neither extremely typical nor atyp- 
ical of their color category. From each of the target colors, four 
variants were created, two closer to the prototype and two further 
away from the prototype. These variants served as the recogni- 
tion test lures. The hue distance between each hue in the set of 5 
test hues, the target and four lures, was equal. The hue distances 
were normalized for the different color spaces, with green encompassing 
a larger number of degrees than red. No variant within a 
set crossed the prototype or the color category boundaries. 
The calculated colors can be found in Supplementary Materials: 
Appendix A. 
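
The construction of each five-hue test set can be sketched as follows. This is only an illustration under stated assumptions, not the stimulus-generation procedure actually used: the function names, the example hue angles, and the step size are hypothetical (the calculated colors are those in Appendix A). The sketch assumes hue is expressed in degrees in a lightness/chroma/hue representation and uses the standard polar-to-Cartesian conversion to obtain a* and b*.

```python
import math

def signed_hue_diff(from_deg, to_deg):
    """Shortest signed angular difference (degrees) from one hue to another."""
    return (to_deg - from_deg + 180.0) % 360.0 - 180.0

def make_test_set(target_h, prototype_h, step_deg):
    """Map offsets -2..+2 to hue angles; positive offsets move toward the prototype."""
    direction = 1.0 if signed_hue_diff(target_h, prototype_h) > 0 else -1.0
    return {k: (target_h + direction * k * step_deg) % 360.0
            for k in (-2, -1, 0, 1, 2)}

def lch_to_lab(lightness, chroma, hue_deg):
    """Convert lightness/chroma/hue to L*, a*, b* (hue is the polar angle)."""
    h = math.radians(hue_deg)
    return (lightness, chroma * math.cos(h), chroma * math.sin(h))

# Hypothetical values: a target 10 degrees from the prototype, 2-degree steps.
hues = make_test_set(target_h=150.0, prototype_h=140.0, step_deg=2.0)
lab_colors = {k: lch_to_lab(50.0, 40.0, h) for k, h in hues.items()}
```

With these illustrative values, the two typical lures lie between the target and the prototype and the two atypical lures lie on the far side of the target, so no lure crosses the prototype.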

Dell UltraSharp U2410 monitors were used to display the stimuli, 
and the color calibration profiles were created using an X-Rite 
i1Display Pro color calibrator. The stimuli were created using 
Adobe Photoshop to convert the calculated colors to an RGB 
device-specific color profile for each monitor, resulting in uniform 
presentation across the three monitors. Using a photometer, the 
experimental cubicles were found to have similar intensities of 
light. 

Procedure 

There were two target judgment conditions. Participants chose 
between the basic color categories, green and red, for the cate- 
gorical judgment and between like and dislike for the preference 
judgment. The categorical judgment response keys were counter- 
balanced across participants whereas the like/dislike response keys 
were in left to right order as it is a natural mapping. The second 
judgment of each trial was a same/different judgment. The par- 
ticipant was to judge whether the second hue presented during 
the trial matched the first hue that had elicited the category or 
preference judgment. 

Each trial consisted of a fixation cross (1500 ms), the target 
hue (300 ms), a question mark eliciting a button push judg- 
ment (up to 700 ms), a blank screen (500 ms) and a response 
screen with a test hue also eliciting a button push judgment (up 
to 4000 ms). The participant's response immediately ended the 
response-eliciting screens. The trials were portioned into blocks 
of 80 trials consisting of all 8 target colors being paired with their 
5 test hues (the original hue and the 4 lures) for 2 trials each. 
Each block had one type of judgment (categorical or preference) 
being the response to the question mark. There were 4 blocks, 
alternating between the judgment types. The order of the blocks 
was counter-balanced across participants. 
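
The block structure described above can be sketched as follows. This is a minimal illustration, not the experiment software that was used: the names are hypothetical, and random ordering of trials within a block is an assumption, since the trial order within a block is not stated here.

```python
import random
from collections import Counter

# Screen durations in ms, as described in the text.
TRIAL_TIMELINE_MS = [
    ("fixation", 1500),
    ("target_hue", 300),
    ("judgment_prompt", 700),  # upper limit; a response ends the screen
    ("blank", 500),
    ("test_hue", 4000),        # upper limit; a response ends the screen
]

def build_block(judgment, n_targets=8, offsets=(-2, -1, 0, 1, 2), reps=2, seed=None):
    """One block: every target paired with each of its 5 test hues, 2 trials each."""
    trials = [
        {"judgment": judgment, "target": target, "offset": offset}
        for target in range(n_targets)
        for offset in offsets
        for _ in range(reps)
    ]
    random.Random(seed).shuffle(trials)  # assumption: random order within a block
    return trials

block = build_block("category", seed=0)
pair_counts = Counter((t["target"], t["offset"]) for t in block)
```

Each block then contains 8 x 5 x 2 = 80 trials, of which 16 present the original hue at test and 64 present a lure.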

Prior to the main trials, participants were trained on 6 yel- 
low and purple stimuli trials, then completed 2 short blocks of 
10 trials, one block of category judgments and one block of pref- 
erence judgments to allow participants to get into the rhythm of 
responding quickly before the key trials began. These short blocks 
contained the red and green stimuli and were not indicated to be 
practice trials to the participants. 

RESULTS AND DISCUSSION 

We excluded 12 participants, 8 for low color naming accuracy 
(<80% correct) in spite of color vision screening, as well as 4 for 
failing to follow instructions. Failing to follow instructions in this 
and subsequent experiments included a very high rate of "like" 
judgments (>90%), a low response rate at either the judgment or 
test portion of a trial (<80%), or always responding with "same" 
at test. Including these participants does not change the pattern 
of results. Here, we report results based on 24 remaining partici- 
pants. In this and subsequent experiments, individual trials were 
excluded if categorization at study was incorrect or participants 
did not respond at both the study and test portions of a trial. 
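
The participant-level exclusion criteria can be expressed as a simple check. The sketch below is illustrative only: the function and argument names are hypothetical, all rates are assumed to be proportions between 0 and 1, and "always responding same" is assumed to mean a same-response rate of exactly 1.

```python
def exclusion_reason(color_naming_acc, like_rate, judgment_response_rate,
                     test_response_rate, same_rate):
    """Return the reason a participant is excluded, or None if retained."""
    # Low color naming accuracy (<80% correct).
    if color_naming_acc < 0.80:
        return "low color naming accuracy"
    # Failing to follow instructions: >90% "like" judgments, <80% response
    # rate at either portion of a trial, or always responding "same" at test.
    if (like_rate > 0.90
            or judgment_response_rate < 0.80
            or test_response_rate < 0.80
            or same_rate == 1.0):
        return "failed to follow instructions"
    return None
```

For example, a participant with 95% naming accuracy who answered "like" on 97% of preference trials would be excluded for failing to follow instructions.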

The analyses relied on the d' measure of sensitivity used in 
signal detection theory (Stanislaw and Todorov, 1999). The d' 
measure has been used in recent studies of representational shift 
(Richler et al., 2011, 2013; Blanco and Gureckis, 2013) though 
false alarm rates alone were used in the original paper (Lupyan, 
2008). Compared to analyses based on raw scores such as false 
alarm rates, d' not only takes account of variations in hit rate 
but has the advantage of being a better match for the under- 
lying Gaussian nature of recognition data (see Macmillan and 
Creelman, 2004, for a general overview, and Heit and Rotello, 
2014, for a more recent discussion). For the analyses, we calculated 
two overall hit rates per participant, one for each condition, 
and used these along with the four lure false alarm rates per condi- 
tion to calculate d' values. By using d' rather than raw false alarm 
rates, we are controlling for the general response rate of an indi- 
vidual in a condition in addition to calculating how well they can 
differentially respond to the target vs. the lures. 

In this case, we had one set of test items that were the same 
as the originally presented item. Same judgments on these items 
were considered hits and different judgments were considered 
misses. We also had 4 sets of items that were different from the 
original hue varying in hue space distance (1 step or 2 steps) 
and in direction of typicality (more typical or more atypical of 
the color category). Same judgments in response to these items 
were false alarms and different judgments were correct rejections. 
We calculated d' by subtracting the z-score of the proportion of 
false alarms from the z-score of the proportion of hits. For 
proportions of 0 or 1, z-scores cannot be calculated because the 
inverse of the normal distribution is infinite at those values. We used the 
standard correction of adding or subtracting half a hit or half 
a false alarm where appropriate (Snodgrass and Corwin, 1988). 
The hit and false alarm rates for all experiments are reported in 
Supplementary Materials: Appendix B. The d' measure was cal- 
culated for each of the four levels of the test hue variations and 
for each of the two judgment conditions by subject. 
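
The d' calculation described above can be sketched in a few lines. This is a minimal illustration rather than the analysis script that was used: the function names are hypothetical, and the correction is implemented on the assumption that half a count is added or removed only when a proportion would otherwise be exactly 0 or 1.

```python
from statistics import NormalDist

def corrected_rate(count, n):
    """Proportion of 'same' responses, adjusted by half a count at 0 or 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    if count == 0:
        return 0.5 / n          # add half a response
    if count == n:
        return (n - 0.5) / n    # remove half a response
    return count / n

def d_prime(hits, n_targets, false_alarms, n_lures):
    """d' = z(hit rate) - z(false alarm rate)."""
    z = NormalDist().inv_cdf    # inverse of the standard normal CDF
    return z(corrected_rate(hits, n_targets)) - z(corrected_rate(false_alarms, n_lures))

# Hypothetical counts: 12 hits on 16 target trials, 4 false alarms on 16 lure trials.
example = d_prime(hits=12, n_targets=16, false_alarms=4, n_lures=16)
```

In the design reported here, the same per-condition hit rate would be combined with the false alarm rate for each of the four lure types in turn, giving four d' values per condition for each participant.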

A significantly lower level of sensitivity, namely a lower d', was 
taken to be evidence of the direction of a shift. Lower d' corre- 
sponds to more false alarms or more non-targets confused to be 
the same as the target. So, for example, if there is lower d' as a 
result of more false alarms for prototypical items, this suggests 
a prototypical shift — the memory traces are more similar to the 
more typical test items than the less typical items. Likewise, lower 
d' for atypical items suggests an atypical shift. No difference, or 
a symmetrical sensitivity, would imply that memory does not 
shift relative to the category typicality gradient. The representa- 
tional shift hypothesis (Lupyan, 2008) suggests that there should 
be an interaction of typicality and condition with a lower d' for 
the more typical lures than the atypical lures only in the cate- 
gory labeling condition. There should be no typicality effect, or 
at least a smaller effect, for the preference condition. The depth 
of processing account (Richler et al., 2011; Blanco and Gureckis, 
2013) predicts a main effect of condition with no typicality effect; 
the sensitivities should be symmetrical. The key prediction of 
this account is more accurate memory, a higher d', for items in 
the preference condition which is suggested to be more deeply 
processed than the items in the category labeling condition. 

To test whether there was a difference in d' by condition 
(color vs. preference judgment), distance from the original hue 
(1 step vs. 2 steps in hue space) or by direction of typicality 
(typical vs. atypical), we ran a 2 x 2 x 2 ANOVA. The results 
can be seen in Figure 1. There was no effect of condition, with 
labeling the category or making a preference judgment not dif- 
ferentially affecting hue sensitivity. There was a main effect of 
distance, F(1, 23) = 29.59, p < 0.001, η² = 0.184. The distance of 
2 units had a d' mean of 0.672 while the d' mean of the distance 
of one unit was 0.259, indicating that there was less sensitivity 
to a hue change when the test hue was closer to the original hue 
as one would expect. This finding indicates that the hues that are 
less different in color space are less detectable. Therefore, any shift 
that has taken place with the color hues is subtle and within a few 
degrees of hue space. 

The key finding was a main effect of typicality, F(1, 23) = 12.92, 
p < 0.01, η² = 0.100. The d' mean of more typical test hues was 
0.612 and the d' mean of hues less typical of the color category was 
0.318, indicating that participants were less sensitive to changes in 
hue if the hue was atypical of the color category. In other words, 
participants were more likely to false-alarm to atypical test items 
than to typical test items. Based on prior theoretical work, the 
prediction was actually the opposite, that there would be less 
sensitivity and therefore a representational shift in the typical 
direction. 
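
The inference from asymmetric sensitivity to the direction of a shift can be made explicit with a small sketch. The function name is hypothetical, and the comparison of two means is only a schematic of the logic; in the analyses reported here, the inference rests on the significance test for the typicality effect, not on the raw ordering of the means.

```python
def shift_direction(d_typical, d_atypical):
    """Lower d' on one side means more false alarms there, i.e., a shift toward that side."""
    if d_atypical < d_typical:
        return "atypical"
    if d_typical < d_atypical:
        return "prototypical"
    return "none"

# Mean d' values reported for Experiment 1A.
print(shift_direction(d_typical=0.612, d_atypical=0.318))  # prints "atypical"
```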

Additionally, there were two marginal components of the 
ANOVA, an interaction of condition and distance [F(1, 23) = 3.11, 
p = 0.091] and an interaction of distance and typicality [F(1, 23) = 
3.11, p = 0.112]. Due to the unexpected result, the marginal findings, 
as well as the number of participants excluded resulting in a 
small final sample size, we view conducting a direct replication as 
important to having confidence in our results (see Cesario, 2014 
for discussion of the importance of direct replication). 



[Figure 1. Panel title: "Experiment 1a: Sensitivity of Hue Discrimination"; x-axis: Hue Steps Toward the Focal Hue (-2, -1, +1, +2).]

FIGURE 1 | Sensitivity of hue discrimination in Experiment 1A. Lower 
d' values indicate more confusion of hues at a particular level of the hue 
factors, here denoted with sign for toward (+) or away (-) from the 
prototype and with number (1, 2) for steps distant from the original hue. 
Here, d' is lower in the atypical (-) direction than in the prototypical 
direction. It is also lower at 1 step removed from the originally presented 
hue than at 2 steps. There is no difference by condition. Error bars 
represent the standard error of the means. 
EXPERIMENT 1B 
METHODS 

Sixty-seven students participated using the same criteria as in 
Experiment 1A. All materials and procedures were the same. 

RESULTS AND DISCUSSION 

We excluded 22 participants, 10 for low color naming accuracy 
(<80% correct) in spite of color vision screening, as well as 12 for 
failing to follow instructions. Including these participants does 
not change the pattern of results. Here we report findings based 
on 45 remaining participants. 

The results are shown in Figure 2. In the same 
2 x 2 x 2 ANOVA on the d' scores, there was again no effect 
of condition. We replicated the significant main effect of distance, 
F(1, 44) = 83.58, p < 0.001, η² = 0.242, indicating that again it 
was easier to distinguish items further from the original, with the 
d' mean of 2 units being further from zero at 0.652 than the 
mean of the 1 unit hue distance items at 0.203. We also replicated 
our typicality main effect, F(1, 44) = 10.39, p < 0.01, η² = 0.048, 
with the atypical direction mean of 0.336 and the typical direction 
mean of 0.519. Again, participants were less sensitive to 
changes in hue if the change was away from the category prototype. 
None of the interactions were significant, including the 
previously marginal results. 

With this replication we can have more confidence in con- 
cluding that the representational shift is occurring in the atypical 
direction. 

EXPERIMENT 2A 

The representational shifts at a half-second delay between presentation 
and test in Experiments 1A and B were in the atypical 
direction. Lupyan (2008) had argued that labeling should have an 
effect of increasing the typicality of a representation at encoding. 
Here, we increased the delay between presentation and test to 5 s 
to see if a labeling effect or a reversal in the direction of the shift 
emerged with more processing time. 

METHODS 

Forty students were recruited as in the other experiments. The 
materials and procedure were the same as in Experiments 1A 
and B with the exception of increasing the delay of the blank 
screen between the original hue presentation and the test hue 
presentation from 500 ms to 5000 ms. 

RESULTS AND DISCUSSION 

We excluded 21 participants for low color naming accuracy 
(<80% correct) in spite of color vision screening. Including these 
participants does not change the pattern of results. Here, we 
report results based on 19 remaining participants. 

We conducted a 2 (color vs. preference judgment) x 2 (1 
hue step vs. 2 hue steps) x 2 (typical vs. atypical direction) 
ANOVA on d' as in the previous experiments. Figure 3 shows 
the d' means and standard errors at each level of the ANOVA. There 
was no effect of condition. We found a main effect of distance 
[F(1, 18) = 54.18, p < 0.001, η² = 0.297], with the 2 units of hue 
distance being more detectable (mean = 0.513) than the 1 hue 
step (mean = 0.057). We also again found a main effect of typicality 
[F(1, 18) = 21.99, p < 0.001, η² = 0.204], with more atypical 
hues being less detectable (mean = 0.1) than more typical hues 
(mean = 0.47). No interactions were significant. 

As in Experiments 1A and B, there was no effect of condition 
and a significantly lower sensitivity in the atypical direction, 
pointing again to a shift away from the prototype. However, there 
was again a relatively high number of exclusions resulting in a low 
final sample size. We again conducted a direct replication. 

[Figure 2. Panel title: "Experiment 1b: Sensitivity of Hue Discrimination"; x-axis: Hue Steps Toward the Focal Hue; legend: Judgment Type.]

FIGURE 2 | Sensitivity of hue discrimination in Experiment 1B. Lower 
d' values indicate more confusion of hues at a particular level of the hue 
factors, here denoted with sign for toward (+) or away (-) from the 
prototype and with number (1, 2) for steps distant from the original hue. 
Here, d' is again lower in the atypical (-) direction than in the prototypical 
direction. It is also lower at 1 step removed from the originally presented 
hue than at 2 steps. There is no difference by condition. Error bars 
represent the standard error of the means. 

[Figure 3. Panel title: "Experiment 2a: Sensitivity of Hue Discrimination"; x-axis: Hue Steps Toward the Focal Hue; legend: Judgment Type (Category, Preference).]

FIGURE 3 | Sensitivity of hue discrimination in Experiment 2A. Lower 
d' values indicate more confusion of hues at a particular level of the hue 
factors, here denoted with sign for toward (+) or away (-) from the 
prototype and with number (1, 2) for steps distant from the original hue. 
Here, d' is again lower in the atypical (-) direction than in the prototypical 
direction. It is also lower at 1 step removed from the originally presented 
hue than at 2 steps. There is no difference by condition. Error bars 
represent the standard error of the means. 

EXPERIMENT 2B 
METHODS 

We recruited 62 participants using the same criteria as the previ- 
ous experiments. All materials and procedures were the same as 
in Experiment 2A. 

RESULTS AND DISCUSSION 

We excluded 14 participants, 7 for low color naming accuracy 
(<80% correct) in spite of color vision screening and an addi- 
tional 7 for failing to follow instructions. Including these par- 
ticipants does not change the pattern of results. Here, we report 
results based on 46 remaining participants. 

We again conducted the same 2 x 2 x 2 ANOVA on the d' 
measurements (Figure 4). There was a main effect of judgment 
condition in this experiment, unlike the 3 others, F(1, 45) = 6.26, 
p < 0.05, η² = 0.067. Preference-judged items (mean = 0.249) 
were less detectably different from the original than color-judged 
items (mean = 0.500). In other words, there were more false 
alarms, and more shifting, for preference judgments than for 
labeling judgments. There was also a main effect of distance 
[F(1, 45) = 30.60, p < 0.001, η² = 0.103], replicating the finding 
that items 2 hue steps distant from the original (mean = 0.496) 
are more detectable than items 1 step distant (mean = 0.252). 
We also replicated our previous main effect of typicality 
[F(1, 45) = 7.48, p < 0.01, η² = 0.043], with more atypical items 
being less detectable (mean = 0.293) than more typical items 
(mean = 0.456). Again, no interactions were significant. 



Although d' is a direct measure of discrimination, for completeness 
we conducted post-hoc analyses of the raw false alarm 
rates using the same 2 × 2 × 2 ANOVA (see also Supplementary 
Materials: Appendix B). In all of the previous experiments 
reported here, the pattern of results was consistent with the 
ANOVA conducted on d' scores. In this experiment, while the 
false alarm rates of the foils continued to show the typicality 
effect, F(1, 45) = 8.688, p < 0.005, and the distance effect, 
F(1, 45) = 31.1, p < 0.001, the main effect of condition was not 
significant when false alarm rate was the dependent variable, 
F(1, 45) = 0.006, p = 0.939. The main effect of condition in the d' 
analysis therefore reflects the participants' high hit rates in the 
labeling condition, which d' takes into account. 
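
The dependence of d' on hit rate can be illustrated with a minimal sketch. It assumes the standard equal-variance signal detection formula, d' = z(hit rate) - z(false alarm rate), which is not spelled out in the text above; the function name and all rates below are hypothetical and are not values from the experiments. With the false alarm rate held constant across two conditions, a difference in hit rate alone is enough to produce a difference in d'.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance signal detection sensitivity: z(H) - z(FA).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates, NOT data from the experiments: the two conditions
# share one false alarm rate and differ only in their hit rates.
shared_fa = 0.60
labeling = d_prime(hit_rate=0.80, false_alarm_rate=shared_fa)
preference = d_prime(hit_rate=0.70, false_alarm_rate=shared_fa)

print(f"labeling   d' = {labeling:.3f}")
print(f"preference d' = {preference:.3f}")
```

Because the false alarm term is identical in both calls, the gap between the two d' values comes entirely from the hit rates. Rates of exactly 0 or 1 have no finite z-score, so real data would need a correction before this function could be applied.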

Overall, we replicated the direction of the shift. That is, extend- 
ing the time between study and test from half a second to 5 s 
did not change the atypical nature of the representational shift. 
Here we observed a difference between conditions for the first 
time. Category-labeled hues were more distinguishable from the 
original than those that were preference-judged. We hesitate to 
draw too strong a conclusion from this difference since it was 
only found in one of the four experiments and the false alarm 
analysis implied that the condition effect depends on the hit rate 
rather than the false alarm rate. Interestingly, however, with only 
a main effect and no significant interaction, category labeling 
increased sensitivity to hue differences overall, rather than in 
a particular direction toward or away from the prototype. 

GENERAL DISCUSSION 

In our experiments, we found less sensitivity to differences 
between a studied hue and an unstudied test hue that is less typ- 
ical of the category. In other words, participants were more likely 
to false-alarm to atypical items than to typical items. We take this 
as evidence that there are representational shifts and that they are 
away from the prototype. Additionally, there may be a judgment 
condition effect that emerges over time, with category-labeled 
hues being more easily detected as different from the original hue. 
There is no interaction between condition and typicality direction, 
indicating that while category-labeled hues might be more 
detectable, this is not due to shifting. Instead, we can speculate 
that labeling a color allows participants to reduce bias in either 
direction equally. 

REPRESENTATIONAL SHIFTS, DEPTH OF PROCESSING, OR TRANSFER 
APPROPRIATE PROCESSING 

The representational shift hypothesis (Lupyan, 2008) predicted 
that (1) there are representational shifts, (2) the shifts happen 
at encoding, (3) shifts are in the prototypical direction, and (4) 
category-labeled items are more strongly shifted. Our paradigm 
was able to show that there are representational shifts and that 
they occur very quickly, possibly at encoding. However, we 
found atypical shifts instead of the predicted prototypical shifts. 
Additionally, we found shifts for both category labeling judgments 
and preference judgments; indeed, in Experiment 2B the shifts were 
stronger in the preference condition. Given these results, we 
question whether the representational shift hypothesis as detailed 
by Lupyan is the appropriate explanation here. 

Depth of processing (Craik and Tulving, 1975; Richler et al., 
2011; Blanco and Gureckis, 2013) does not fit the present results 
either. If preference judgments require greater depth of processing 
and depth of processing leads to better memory, then preference 
judgments should lead to more sensitivity to hue changes. In 
fact, we found that preference-judged hues led to either 
indistinguishable sensitivity or less sensitivity than category-labeled hues. 
Therefore, the depth of processing account of the representational 
shifts is not satisfactory either. 

FIGURE 4 | Sensitivity of hue discrimination in Experiment 2B. Lower 
d' values indicate more confusion of hues at a particular level of the hue 
factors, here denoted with sign for toward (+) or away (-) from the 
prototype and with number (1,2) for steps distant from the original hue. 
Here, d' is again lower in the atypical (-) direction than in the prototypical. 
It is also lower at 1 step removed from the originally presented hue than at 
2 steps. There is a difference by condition, with less sensitivity for hues that 
were preference-judged than for hues that were category-labeled. Error bars 
represent the standard error of the means. 

Instead, we point to transfer appropriate processing (Morris 
et al., 1977) as a potential framework in which to understand 
the results. In transfer appropriate processing, details relevant 
to the task at hand are processed with more depth than details 
that are less appropriate at the time of encoding. Perhaps preference 
judgments have an inherently greater depth of processing 
compared to basic categorization, but the content of that depth 
is not necessarily what is needed for greater sensitivity in the 
present task. Directing processing into comparison of the hue 
against preferences and making a valence judgment may distract 
from the encoding of the exact hue while color labeling concen- 
trates processing on the appropriate aspect of the hues for greater 
sensitivity. Rather than greater raw processing, the right kind of 
processing leads to more exact memory. 

ATYPICAL SHIFTS 

Counterintuitively, the representational shifts at a rapid test pace 
were in the atypical direction. Previous research on categorical 
knowledge effects on memory mostly suggests that if memory 
is altered systematically from the original percepts it should be 
in a prototypical direction (e.g., Heit, 1997). Categories serve to 
generalize our knowledge and to highlight similarity among dis- 
tinct exemplars. What purpose could be served by atypical shifts? 
While we will not claim to have a final answer to this question, we 
speculate that it is related to perceptual expertise processes. 

A domain where a similar atypical representational shift has 
been found is the recognition of faces, where it has been called a 
distinctiveness effect (Rhodes et al., 1987). Participants were faster 
to recognize exaggerated faces than faces with the original proportions, 
which were in turn recognized faster than more generic versions 
of the faces. The authors argued that the most distinctive features 
of a face are what is encoded into memory, with the more 
generic portions not encoded as strongly. Gist memory (Reyna 
and Brainerd, 1995) can be used to fill in the representation. 
When a person then uses the encoding to recognize a face, 
the exaggerated face matches the distinctive features better than 
the true face. 

Hue is much less complex than faces (just a single dimension 
of a single feature), and yet we found a similar bias away 
from the prototype. Rather than encode the one feature veridically, 
participants appear to have encoded a shifted hue. Perhaps 
the mechanism that underlies the caricature effect is a magnification 
of the atypical effect we observed, through multiple features 
all moving atypically. While the details of how this one-feature 
case relates to the more complex case of faces are unclear, our 
results call into question the explanation that we simply encode 
the distinctive features of a face as they are without the more 
generic aspects to achieve an exaggerated encoding. 

There has been a long debate over whether faces are in some 
way special in object processing (Farah, 1996; Kanwisher et al., 
1997; Tovee, 1998; Gauthier and Logothetis, 2000; McKone et al., 
2007). The majority of humans, those without a specific deficit 
called prosopagnosia, are considered to be experts at facial 
recognition. On the domain-general expertise explanation of face 
processing effects, visual object domains other than faces, such as 
cars and birds, can be processed in similar ways with experience 
(Gauthier et al., 2000). Bukach et al. (2006) advocated the use 
of an expertise framework to understand category specialization. 
Colors are generally associated with a basic level of categorization 
(Berlin and Kay, 1969; Rosch, 1975). When forced by a task to 
make fine subordinate distinctions, a different strategy appears 
to emerge. Movement in the atypical direction in our experi- 
ments was relative to basic categories. Perhaps the fine-grained 
categorization process is overcompensating for a more natural 
generalization and homogenization process that occurs when the 
participant is functioning at the basic category level. The detailed 
memory of hue demanded of participants in this task was not a 
typical activity. But for faces, fine-grained distinctions are a basic 
need. This may indicate that expert processing techniques can 
be flexibly recruited in real time for a task and do not depend 
exclusively on trained distinction-making within a domain. 

ONLINE ROLE OF LABELS 

While the prototypical shifts predicted by the representational 
shift hypothesis were not found and the mechanism underly- 
ing the shifts proposed by that hypothesis was not supported, 
the larger framework of the label-feedback hypothesis (Lupyan, 
2012) is not something we are looking to challenge. In the label- 
feedback hypothesis, language is a pervasive online influence on 
cognition in the tradition of Whorf (1956). Language is an inher- 
ent part of the complex multidimensional system of the normal 
human adult mind, not something that is switched on and off 
depending on the task. Instead, labels serve to up-regulate the 
influence of linguistic knowledge online while verbal interference 
down-regulates that influence. 

One interpretation of the current results would be compatible 
with the label-feedback hypothesis. Namely, the influence of the 
labels on memory occurs online and serves to increase the sen- 
sitivity of an individual's ability to detect change in the labeled 
category. In this account, the up-regulation of language's influ- 
ence in the labeled case allows processing to focus in on the hue 
resulting in more accurate memory. Sloutsky (2003) discussed the 
role labels have in directing attention during category learning 
to relevant similar features among items in a labeled category. 
Extending the logic from learning to use of learned categories, if 
labels are features of the category, invoking them will draw atten- 
tion to the dimension(s) on which the category similarity and 
distinctions are judged. While labeling did not have the effect of 
pulling items toward the category prototype in this task, labeling 
could have a more general modulating influence on encoding. 

An alternative explanation could be somewhat consistent with 
the label-feedback hypothesis but from the opposite direction. 
The basic categories of color could be essentially automatic 
(Grill-Spector and Kanwisher, 2005), with language's influence 
close to ceiling. Preference judgments may keep processing 
from reaching the depth it naturally would reach regardless of 
the color label, because preference valence needs 
to be the focus of directed attention (Simons, 2000). On this account, 
preference judgments would down-regulate the influence of linguistic 
category knowledge. 

Given the present evidence, we are agnostic as to whether labeling 
has an added effect, the preference judgments have a distracting 
one, or some combination of the two is at play. Disambiguating 
the competing interpretations would be an interesting direc- 
tion for further research. Either way, attention appears to be 
directed at the relevant dimension for the memory test when col- 
ors are labeled while attention is on a different dimension when 
preferences are being elicited. 

CATEGORICAL PERCEPTION 

Our results can also speak to recent developments in the categor- 
ical perception literature. Categorical perception is the effect of 
enhanced discrimination performance when the items being dis- 
criminated cross category boundaries. This has been attributed to 
changes in perception (Harnad, 1987), particularly the enhanced 
distinctiveness of learned category differences (Goldstone, 1994). 
Roberson and colleagues (Roberson et al., 2009; Kikutani et al., 
2010) proposed a different account, suggesting that category labels 
play a crucial role in categorical perception, with different labels 
facilitating greater accuracy and faster reaction times. 

Hanley and Roberson subsequently updated their account. 
Conducting a reanalysis of a series of two-alternative forced-choice 
categorical perception tasks discriminating between colors 
or faces (Hanley and Roberson, 2011), they found an asymmetry 
among the within-category trials that are traditionally treated as a 
single condition. On trials where the target item was more typical 
of the category (a better exemplar) than the foil, participants 
had a proportion correct similar to that on between-category 
trials. On trials where the target item was more atypical of the 
category than the foil (a poor exemplar), participants performed 
much worse. These poor exemplar trials account for the overall 
categorical perception effect. Hanley and Roberson account for 
this finding through the relative reliability of labels applied to the 
items. If a participant labeled a hue blue when the hue was on 
its own, a participant who remembers "blue" rather than the 
actual color will be more likely to choose the better example of 
that category, even if the hue they saw was not the best example 
of the category at test. The items around a boundary are more 
ambiguous and can be labeled in different ways based on context. 

Hendrickson et al. (2012) used a category learning paradigm to 
investigate the label ambiguity hypothesis (Hanley and Roberson, 
2011). They found a pre-categorization asymmetry 
in addition to the enhanced effect after category learning. If the 
asymmetry exists prior to learning labels, label ambiguity alone 
cannot account for the asymmetry. They put forward an account 
based on unsupervised learning of clusters regardless of labeling. 

Our experiments did not contain a classic categorical perception 
task since we only conducted within-category trials. We also 
did not utilize a two-alternative forced-choice paradigm. However, 
the same/different task similarly requires participants to compare 
their memory for a stimulus to the test items. Rather than use 
all items as both target and foil, we had set targets with foils in 
both the typical and atypical direction. Therefore, each target hue 
was both a good exemplar (atypical trials) and a poor exemplar 
(typical trials) compared to the current foil. The research above 
would suggest that there should be enhanced performance on 
good exemplar, or atypical, trials. This is the opposite of what we 
found. Sensitivity to differences decreased when the test hue was 
less typical of the category. The label ambiguity hypothesis cannot 
account for this result. 

CONCLUSION: MEMORY, CATEGORIZATION AND 
REASONING ARE INTERTWINED 

We examined the effect of active category labeling on hue mem- 
ory creation. Memory even at 500 ms after initial perception 
is affected by categorical structure, regardless of active label- 
ing. Given the short time scale and the reliable influence of 
category typicality, it seems safe to conclude that memory and 
categorization are inextricably intertwined in this task. While our 
experiments did not look at learning, labels are known to facili- 
tate the learning of categories (Lupyan et al., 2007; Sloutsky and 
Fisher, 2012), which is considered to be a reasoning process. 
Insofar as categorization is based on past experience, is ubiquitous 
in its influence on memory, and is developed at least in part 
through reasoning, the historically distinct topics of memory, 
categorization, and reasoning would appear to be composed of common 
elements. As the topics continue to be considered together, 
the interrelations and underlying processes will become 
clearer. 